12/18/2017

Starbucks Brand Perceptions

Starbucks Corporation, founded in 1971, has long been considered the main representative of "second wave coffee", distinguishing itself from other coffee-serving venues in the US by taste, quality and customer experience. Its success and customers' addiction to the brand have led many scholars to ask what explains them.

Some scholars have argued that the popularity of Starbucks is probably due not to its quality but to its customers' buying experience and its marketing strategy.

Using Twitter

Twitter is an online news and social networking service where users post and interact with messages, called "tweets."

As of 2016, Twitter had more than 319 million monthly active users. 8% of the followers of Twitter accounts are publicly visible and can be programmatically accessed through Twitter's Application Programming Interface (API).

Therefore, in this project, I am going to use Twitter as the platform to assess Starbucks' brand image.

Three hashtags about Starbucks

In order to find out what Internet users, specifically Twitter users, are saying about Starbucks and how they feel about the Starbucks brand, I planned to collect tweets containing the following hashtags:

  • "#starbucks or @starbucks" (2000 sample tweets)

  • "#starbucksforlife" (1000 tweets)

  • "#starbucksathome" (1000 tweets)

  • "#givegood" (1000 tweets)
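Once tweets are collected, they need to be separated by hashtag before any per-topic analysis. The following is a minimal Python sketch of that grouping step, not the code used in this project; the function name `group_by_hashtag`, the case-insensitive matching rule, and the sample tweets are all assumptions made for illustration.

```python
import re

# Search terms mirroring the marketing hashtags listed above.
HASHTAGS = ["#starbucksforlife", "#starbucksathome", "#givegood"]

def group_by_hashtag(tweets, hashtags=HASHTAGS):
    """Return {hashtag: [tweet texts that contain it]}, ignoring case."""
    groups = {tag: [] for tag in hashtags}
    for text in tweets:
        # Extract whole hashtag tokens so "#starbucks" does not match "#starbucksathome".
        found = {token.lower() for token in re.findall(r"#\w+", text)}
        for tag in hashtags:
            if tag in found:
                groups[tag].append(text)
    return groups

# Made-up tweets, for illustration only.
sample = [
    "Free coffee? Yes please #StarbucksForLife",
    "Brewing my own this morning #starbucksathome",
    "Nothing to do with coffee #mondays",
]
groups = group_by_hashtag(sample)
```

Matching on whole hashtag tokens rather than substrings keeps a general "#starbucks" search from absorbing the campaign-specific hashtags.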

Distribution Table of Frequent Words

To get a general understanding of the most popular words that people use in tweets about the Starbucks brand and its marketing, I produced histograms of the frequent words, as well as word clouds, for each topic.
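The frequency tables behind such histograms and word clouds come down to counting cleaned words. Here is a small Python sketch of that counting step, assuming a very simple cleaning rule (lowercase, drop URLs, hashtags and mentions) and a tiny illustrative stopword list; neither is taken from the original analysis.

```python
import re
from collections import Counter

# A deliberately small stopword set, for illustration only.
STOPWORDS = {"the", "a", "to", "my", "is", "and", "at", "for", "of", "i"}

def top_words(tweets, n=10):
    """Return the n most frequent non-stopword words across the tweets."""
    counts = Counter()
    for text in tweets:
        # Remove URLs, hashtags, and @mentions before tokenizing.
        cleaned = re.sub(r"https?://\S+|[#@]\w+", " ", text.lower())
        words = re.findall(r"[a-z']+", cleaned)
        counts.update(w for w in words if w not in STOPWORDS)
    return counts.most_common(n)

# Made-up tweets, for illustration only.
sample = [
    "Love my latte #starbucks",
    "Latte time! @starbucks",
    "I love the holiday cups",
]
frequent = top_words(sample, n=2)
```

The resulting (word, count) pairs can feed either a bar chart or a word cloud, since both only need the counts.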

Distribution Table of Frequent Words – #Starbucks

Distribution Table of Frequent Words #StarbucksForLife

Distribution Table of Frequent Words #StarbucksAtHome

Distribution Table of Frequent Words #StarbucksGiveGood

Word Cloud

Next, let us plot some word clouds.

Word Cloud for #Starbucks

Word Cloud for #StarbucksForLife

Word Cloud for #StarbucksAtHome

Word Cloud for #StarbucksGiveGood

Statistical Analysis

In this part, I am going to answer the following statistical questions through modelling:

  • "Do people hold a negative attitude toward Starbucks' three marketing hashtags?"

  • "What is the relationship between the sentiment of a tweet and the tweet's popularity?"

  • "Is there a statistically significant difference in sentiment scores, and in retweet numbers, among the three marketing hashtags?"

Before answering those questions, let's look at the distribution of the tweets' sentiment scores:
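The source does not say how the sentiment scores were computed. The integer scores reported later are consistent with a simple lexicon count (positive words minus negative words), so the Python sketch below shows that approach purely as an assumption; the word lists are tiny and hypothetical.

```python
import re

# Hypothetical mini-lexicons; a real analysis would use a full opinion lexicon.
POSITIVE = {"love", "great", "good", "happy", "win"}
NEGATIVE = {"bad", "hate", "worst", "cold"}

def sentiment_score(text):
    """Score = number of positive words minus number of negative words."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(w in POSITIVE for w in words)
    negatives = sum(w in NEGATIVE for w in words)
    return positives - negatives

scores = [sentiment_score(t) for t in [
    "I love this great latte",
    "Worst coffee, cold and bad",
    "Just a cup",
]]
```

Under this scheme a score above zero reads as positive, below zero as negative, and zero as neutral or unmatched.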

Sentiment Scores' Distribution for #Starbucks

Sentiment Scores' Distribution for #StarbucksForLife

Sentiment Scores' Distribution for #StarbucksAtHome

Sentiment Scores' Distribution for #StarbucksGiveGood

Hypothesis Testing

Null Hypothesis: People have a negative attitude toward Starbucks For Life. (i.e. Score = -1)

## [1] 94.36107
## [1] 5.346221e-300
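The two numbers printed above look like a test statistic followed by a p-value, although the output does not label them. Assuming a one-sample t-test of the mean score against the null value of -1, the statistic could be computed along these lines in Python; the scores below are made up and the function name is hypothetical.

```python
import math
import statistics

def one_sample_t(scores, mu0=-1.0):
    """Return (t statistic, degrees of freedom) for H0: mean == mu0."""
    n = len(scores)
    mean = statistics.mean(scores)
    sd = statistics.stdev(scores)          # sample standard deviation (n - 1)
    standard_error = sd / math.sqrt(n)
    return (mean - mu0) / standard_error, n - 1

# Made-up sentiment scores, for illustration only.
example_scores = [1, 1, 2, 0, 1, 1, 2, 1]
t_stat, df = one_sample_t(example_scores)
```

A large positive statistic means the sample mean sits far above -1, which is the situation in which the null hypothesis of a negative attitude would be rejected.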

Hypothesis Testing

Null Hypothesis: People have a negative attitude toward Starbucks At Home.

## [1] 175.6928
## [1] 1.185907e-237

Hypothesis Testing

Null Hypothesis: People have a negative attitude toward Starbucks Give Good.

## [1] 38.41929
## [1] 2.588569e-120

Analyze the Relationship between Sentiment and Retweet Number

##              Df    Sum Sq   Mean Sq F value Pr(>F)    
## total$score   1 150486101 150486101   510.6 <2e-16 ***
## Residuals   985 290295938    294717                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
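As a sanity check on the table above, the F value and the share of variance explained follow directly from the sums of squares and degrees of freedom. The short Python sketch below only redoes that arithmetic with the printed numbers; the variable names are mine.

```python
# Values copied from the ANOVA table above.
ss_model = 150486101      # Sum Sq for total$score
ss_resid = 290295938      # Sum Sq for Residuals
df_model, df_resid = 1, 985

# F is the ratio of the two mean squares.
f_value = (ss_model / df_model) / (ss_resid / df_resid)

# R-squared is the model's share of the total sum of squares.
r_squared = ss_model / (ss_model + ss_resid)

print(round(f_value, 1), round(r_squared, 4))
```

The R-squared obtained this way agrees with the Multiple R-squared of 0.3414 reported in the linear model summary, as expected when the same single predictor is used.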

Generate a Linear Model

## 
## Call:
## lm(formula = total$retweet_count ~ total$score)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3049.9  -357.7  -357.7   649.8  4336.8 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -180.72      31.40  -5.755 1.15e-08 ***
## total$score   538.44      23.83  22.597  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 542.9 on 985 degrees of freedom
## Multiple R-squared:  0.3414, Adjusted R-squared:  0.3407 
## F-statistic: 510.6 on 1 and 985 DF,  p-value: < 2.2e-16
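To read the fitted coefficients, it helps to plug a few scores into the regression line. This Python sketch uses the intercept and slope printed above; the function name is hypothetical.

```python
# Coefficients copied from the model summary above.
INTERCEPT = -180.72
SLOPE = 538.44

def predicted_retweets(score):
    """Fitted retweet count for a given sentiment score."""
    return INTERCEPT + SLOPE * score

for score in (0, 1, 2):
    print(score, round(predicted_retweets(score), 2))
```

A tweet with score 1 and no retweets would have a residual of about -357.7, which matches the first-quartile and median residuals in the summary. The negative fitted value at score 0 is a reminder that a straight line is only a rough description of a count such as retweets.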

Plot the Linear Regression

## Warning: Removed 24 rows containing missing values (geom_smooth).

Get the Summary of Total Score for three hashtags

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    -2.0     1.0     1.0     1.1     2.0     6.0

Get the Summary of Total Retweet Numbers

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0     0.0     0.0   411.7   600.0  5233.0

Get the Summary of Total Favorite Count

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.0000  0.0000  0.3597  0.0000 83.0000

Look at the Linear Relationship

##                     Df    Sum Sq   Mean Sq F value Pr(>F)    
## factor(d4$hashtag)   2 349183958 174591979    1876 <2e-16 ***
## Residuals          984  91598081     93087                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Call:
## aov(formula = total$retweet_count ~ factor(d4$hashtag), data = total)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1511.0   -16.5   -16.5    35.1  5020.6 
## 
## Coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)
## (Intercept)                          1510.95      20.57   73.45   <2e-16
## factor(d4$hashtag)starbucksforlife  -1494.46      25.06  -59.63   <2e-16
## factor(d4$hashtag)starbucksgivegood -1298.58      26.84  -48.38   <2e-16
##                                        
## (Intercept)                         ***
## factor(d4$hashtag)starbucksforlife  ***
## factor(d4$hashtag)starbucksgivegood ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 305.1 on 984 degrees of freedom
## Multiple R-squared:  0.7922, Adjusted R-squared:  0.7918 
## F-statistic:  1876 on 2 and 984 DF,  p-value: < 2.2e-16
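With a factor as the predictor, the intercept is the mean of the baseline group and each other coefficient is an offset from it. The Python sketch below recovers the group means from the printed coefficients. The baseline level is not named in the output; it is presumably the #StarbucksAtHome group, but that is an assumption.

```python
# Coefficients copied from the output above.
intercept = 1510.95
offsets = {
    "starbucksforlife": -1494.46,
    "starbucksgivegood": -1298.58,
}

# Baseline level is unnamed in the output (assumed to be the remaining hashtag).
group_means = {"baseline": intercept}
for level, offset in offsets.items():
    group_means[level] = intercept + offset

for level, mean in group_means.items():
    print(level, round(mean, 2))
```

Read this way, the baseline group averages roughly 1511 retweets, against about 16 and 212 for the other two groups.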

Compare the score mean difference between the three hashtag groups

(i.e. "Starbucks For Life", "Starbucks At Home", "Starbucks Give Good")

##                     Df Sum Sq Mean Sq F value Pr(>F)    
## factor(d4$hashtag)   2  215.5  107.76   349.3 <2e-16 ***
## Residuals          984  303.6    0.31                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Call:
## aov(formula = total$score ~ factor(d4$hashtag), data = total)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -2.9163  0.0364  0.0837  0.0837  5.2396 
## 
## Coefficients:
##                                     Estimate Std. Error t value Pr(>|t|)
## (Intercept)                          1.96364    0.03745   52.44   <2e-16
## factor(d4$hashtag)starbucksforlife  -1.04734    0.04563  -22.95   <2e-16
## factor(d4$hashtag)starbucksgivegood -1.20325    0.04887  -24.62   <2e-16
##                                        
## (Intercept)                         ***
## factor(d4$hashtag)starbucksforlife  ***
## factor(d4$hashtag)starbucksgivegood ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.5554 on 984 degrees of freedom
## Multiple R-squared:  0.4152, Adjusted R-squared:  0.414 
## F-statistic: 349.3 on 2 and 984 DF,  p-value: < 2.2e-16
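For readers who want to see where an F statistic like the one above comes from, here is a minimal Python sketch of a one-way ANOVA computed from raw groups. It is a generic illustration on made-up scores, not a reproduction of the analysis above.

```python
import statistics

def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a list of numeric groups."""
    all_values = [v for g in groups for v in g]
    grand_mean = statistics.mean(all_values)
    # Variation of group means around the grand mean.
    ss_between = sum(len(g) * (statistics.mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of observations around their own group mean.
    ss_within = sum((v - statistics.mean(g)) ** 2 for g in groups for v in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_value = (ss_between / df_between) / (ss_within / df_within)
    return f_value, df_between, df_within

# Made-up sentiment scores for three hypothetical hashtag groups.
example_groups = [[2, 2, 3], [1, 1, 0], [1, 0, 1]]
f_value, df_between, df_within = one_way_anova_f(example_groups)
```

The larger the between-group variation relative to the within-group variation, the larger F becomes.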

III. Mapping

In order to visualize the sentiment scores in the context of where the Twitter users who tweet on those three topics are located, I am going to map the tweets by sentiment score. Positive scores are indicated by red points and negative scores by blue points. The absolute value of the sentiment score is shown by the size of the point on the map.
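The encoding described above (sign to color, magnitude to size) can be written as a small helper. This is a Python sketch under assumptions: the source does not say how a score of exactly zero is drawn or what size scale is used, so the grey color and the `base_size` parameter are hypothetical.

```python
def point_style(score, base_size=2.0):
    """Map a sentiment score to a (color, size) pair for plotting."""
    if score > 0:
        color = "red"       # positive sentiment
    elif score < 0:
        color = "blue"      # negative sentiment
    else:
        color = "grey"      # assumption: zero scores are not covered by the source
    return color, base_size * abs(score)

styles = [point_style(s) for s in (3, -2, 0)]
```

Note that with a size proportional to the absolute score, tweets scored zero would have zero size and effectively disappear from the map.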

First, I created a U.S. map with the sentiment scores of all three hashtags. The map shows that the data are mostly scattered around California and the East Coast. The map is mostly covered by red points, which means that customers hold a positive attitude toward Starbucks when they talk about those three topics. A few blue points appear in Indiana and Massachusetts. Thus, I will plot each topic's sentiment scores on a second U.S. map to see which topic has the largest number of negative scores.

Mapping with Sentiment Scores on US map

Mapping under Topic "Starbucks for Life"

Mapping under Topic "Starbucks at Home"

Mapping under Topic "Starbucks Give Good"

Sentiment across different locations

Los Angeles Sentiment Scores

Penn

Penn State Sentiment Scores

Massachusetts

New York

Finding from Maps

  • Blue points appear on the Los Angeles map
  • Blue points appear on the Penn and Massachusetts maps
  • New York sentiment scores are much higher (almost no blue points)
  • The sample of tweets with location information is so small that too few points are plotted on the maps to reach any firm conclusion.

IV. Shiny

  • A navbar with all the graphs included

  • An interactive map to compare retweeted tweets across different regions

Navbar

Interactive Map

V. Future Improvements

The initial sample I chose was 5000 tweets. However, after keeping only the useful tweets with valid locations and English as the language, only around 2000 tweets remained. Therefore, the sample size is too small to reach firm conclusions. In order to reach a firm conclusion about perceptions of Starbucks' marketing, we should explore more data sets and not be limited to Twitter and Google.
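The filtering step that shrinks the sample can be sketched in Python as follows. The field names `lang` and `location` and the sample records are assumptions for illustration; the actual columns used in this project are not shown in the source.

```python
def is_usable(tweet):
    """Keep tweets marked as English that carry a non-empty location."""
    location = (tweet.get("location") or "").strip()
    return tweet.get("lang") == "en" and bool(location)

# Made-up records, for illustration only.
raw_tweets = [
    {"text": "Love it", "lang": "en", "location": "Boston, MA"},
    {"text": "Me encanta", "lang": "es", "location": "Madrid"},
    {"text": "So good", "lang": "en", "location": "  "},
    {"text": "Yum", "lang": "en", "location": None},
]
usable = [t for t in raw_tweets if is_usable(t)]
```

Counting `len(usable)` against `len(raw_tweets)` makes the loss from filtering explicit, which is useful when deciding how many tweets to collect in the first place.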

Also, Starbucks certainly uses more marketing strategies than these, for example sustainability, international brand image, working environment and so on. Hence, I look forward to exploring customers' perceptions of those strategies on different platforms and in different areas. Please feel free to email me (xuexq@bu.edu) if you have any questions. Thank you for going through this exploration with me!

End